Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.
If no location metadata for an image is available, one way to infer the location is to detect and classify a discernible landmark in the image. Given the large number of landmarks across the world and the immense volume of images uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.
In this notebook, we will take the first steps towards addressing this problem by building models that automatically predict the location of an image based on any landmarks it depicts.
We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.
Install the following Python modules:
In this step, we will create a CNN that classifies landmarks. We aim for an accuracy of at least 20%.
Although 20% may seem low at first glance, it looks more reasonable once you appreciate how difficult this problem is. Often, a photo taken at a landmark captures a fairly mundane subject, such as an animal or plant, as in the following picture.

Just by looking at that image alone, would you have been able to guess that it was taken at the Haleakalā National Park in Hawaii?
An accuracy of 20% is significantly better than random guessing, which would provide an accuracy of just 2%. In Step 2 of this notebook, we will improve accuracy by using transfer learning to create a CNN.
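The 2% random-guess baseline follows directly from the 50 landmark classes in this dataset; the arithmetic, for reference:

```python
# With 50 equally likely classes, uniform random guessing is correct 1/50 of the time
num_classes = 50
random_guess_accuracy = 1 / num_classes
print(f"{random_guess_accuracy:.0%}")  # 2%
```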
The dataset can be found at /data/landmark_images/ in the workspace.
# Install splitfolders to split train data into train and valid
!pip install split-folders
import splitfolders
# Split train data to train and valid
splitfolders.ratio("/data/landmark_images/train", output="train_valid", seed=1337, ratio=(.8, .2), group_prefix=None)
import os

def fn():
    # 1. Get file names from directory
    file_list = os.listdir("train_valid")
    print(file_list)

fn()
import torch
from torchvision import transforms, datasets, models
from torch.utils.data.sampler import SubsetRandomSampler
import numpy as np
loaders_scratch = {'train': None, 'valid': None, 'test': None}
# Batch size and number of workers
batch_size = 64
num_workers = 0
# Define transforms
transform_train = transforms.Compose([
    transforms.Resize(260),
    transforms.CenterCrop(256),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5],
                         [0.5, 0.5, 0.5]),
])

transform_test_valid = transforms.Compose([
    transforms.Resize(260),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5],
                         [0.5, 0.5, 0.5]),
])
# Get the data
train_data = datasets.ImageFolder("train_valid/train", transform= transform_train)
valid_data = datasets.ImageFolder("train_valid/val", transform = transform_test_valid)
test_data = datasets.ImageFolder("/data/landmark_images/test", transform = transform_test_valid)
# Get data loaders
loaders_scratch["train"] = torch.utils.data.DataLoader(train_data,
                                                       batch_size=batch_size,
                                                       num_workers=num_workers,
                                                       shuffle=True)
loaders_scratch["valid"] = torch.utils.data.DataLoader(valid_data,
                                                       batch_size=batch_size,
                                                       num_workers=num_workers)
loaders_scratch["test"] = torch.utils.data.DataLoader(test_data,
                                                      batch_size=batch_size,
                                                      num_workers=num_workers)
classes = [label[3:] for label in train_data.classes]
classes
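The slice `label[3:]` drops the numeric `NN.` prefix that ImageFolder picks up from each class directory name. A quick illustration (the second folder name below is hypothetical; only `09.Golden_Gate_Bridge` appears in this dataset's paths):

```python
# class-directory names following the "NN.Name" pattern used by this dataset
raw_classes = ["09.Golden_Gate_Bridge", "33.Sydney_Opera_House"]  # second one hypothetical
classes_stripped = [label[3:] for label in raw_classes]
print(classes_stripped)  # ['Golden_Gate_Bridge', 'Sydney_Opera_House']
```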
Description:
Visualize a batch from the train loader to confirm that data loading and preprocessing work as expected.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
# visualize a batch of the train data loader
dataiter = iter(loaders_scratch["train"])
images, labels = next(dataiter)  # dataiter.next() was removed in recent PyTorch
# un-normalize and convert from NCHW to NHWC for plotting
images = images.numpy().transpose(0, 2, 3, 1) * 0.5 + 0.5

fig = plt.figure(figsize=(27, 10))
n_show = min(len(images), 32)
for i in range(n_show):
    ax = fig.add_subplot(4, n_show // 4, i + 1, xticks=[], yticks=[])
    ax.imshow(images[i])
    ax.set_title(classes[labels[i]])
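The `* 0.5 + 0.5` step above simply inverts the Normalize transform: with mean and std both 0.5 per channel, normalization maps ToTensor's [0, 1] pixel range onto [-1, 1], and the reverse maps it back. The same arithmetic on plain floats:

```python
# (pixel - mean) / std with mean = std = 0.5 maps [0, 1] to [-1, 1]
pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [(p - 0.5) / 0.5 for p in pixels]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]

# the visualization code inverts this with * 0.5 + 0.5
restored = [n * 0.5 + 0.5 for n in normalized]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```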
# useful variable that tells us whether we should use the GPU
use_cuda = torch.cuda.is_available()
Specify Loss Function and Optimizer
## select loss function
criterion_scratch = torch.nn.CrossEntropyLoss()

def get_optimizer_scratch(model):
    ## select and return an optimizer
    return torch.optim.SGD(model.parameters(), lr=0.1)
Create a CNN to classify images of landmarks.
import torch.nn as nn
import torch.nn.functional as F
# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 256, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(256 * 32 * 32, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 50)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        ## Define forward behavior
        x = F.relu(self.conv1(x))  # 16 x 256 x 256
        x = self.pool(x)           # 16 x 128 x 128
        x = F.relu(self.conv2(x))  # 64 x 128 x 128
        x = self.pool(x)           # 64 x 64 x 64
        x = F.relu(self.conv3(x))  # 256 x 64 x 64
        x = self.pool(x)           # 256 x 32 x 32
        # Flatten
        x = x.view(x.shape[0], -1)
        # Fully-connected
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)  # was F.dropout(x), which stays active even in eval mode
        x = self.fc3(x)
        return x

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
Description:
- Increase the depth (number of channels) as we go deeper into the network.
- Keep the number of parameters as low as possible.

I wasn't sure whether to add dropout layers, since the task seemed too complex for this model to overfit, but in the end I decided to err on the side of caution and use them.

My first attempt didn't give the expected results, so I gradually reduced the number of layers and blocks until I settled on the current architecture, which turned out to be much smaller than my first one: only three convolutional layers.
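As a sanity check on the fc1 input size: each of the three MaxPool2d(2, 2) layers halves the 256×256 spatial resolution, and conv3 leaves 256 channels, which is where `nn.Linear(256 * 32 * 32, 256)` comes from:

```python
# spatial size after three 2x2 max-pooling layers applied to a 256x256 input
size = 256
for _ in range(3):
    size //= 2
print(size)  # 32

# flattened feature count feeding fc1: channels * height * width
flat_features = 256 * size * size
print(flat_features)  # 262144, i.e. 256 * 32 * 32
```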
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.inf
    # add learning-rate scheduler
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=4)

    for epoch in range(1, n_epochs + 1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0

        ###################
        # train the model #
        ###################
        # set the module to training mode
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            # record the average training loss as a running mean (sequential estimation)
            train_loss += (1 / (batch_idx + 1)) * (loss.item() - train_loss)

        ######################
        # validate the model #
        ######################
        # set the model to evaluation mode
        model.eval()
        with torch.no_grad():  # no gradients needed during validation
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                # update average validation loss
                output = model(data)
                loss = criterion(output, target)
                valid_loss += (1 / (batch_idx + 1)) * (loss.item() - valid_loss)

        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, train_loss, valid_loss))
        scheduler.step(valid_loss)

        ## if the validation loss has decreased, save the model at the filepath stored in save_path
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss

    return model
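The incremental update `train_loss += (1 / (batch_idx + 1)) * (loss.data.item() - train_loss)` used in the loops above is simply a running arithmetic mean, computed without storing every batch loss. A quick stdlib check:

```python
batch_losses = [2.0, 1.0, 4.0, 3.0]
running = 0.0
for i, loss in enumerate(batch_losses):
    # same incremental update as in train(): new mean = old mean + (x - old mean) / n
    running += (1 / (i + 1)) * (loss - running)
# running now equals the arithmetic mean of batch_losses (up to float rounding)
```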
Define a custom weight initialization, then train with it for a few epochs:
def custom_weight_init(m):
    classname = m.__class__.__name__
    if classname.find("Linear") != -1:
        m.weight.data.normal_(0, m.in_features ** -0.5)
        m.bias.data.fill_(0)

model_scratch.apply(custom_weight_init)
model_scratch = train(20, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
                      criterion_scratch, use_cuda, 'ignore.pt')
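The `m.in_features ** -0.5` scale draws Linear weights from N(0, 1/√fan_in), the classic heuristic for keeping pre-activation variance roughly constant across layers. A seeded stdlib check that sampled weights really have that spread:

```python
import math
import random

random.seed(0)  # seeded so the sample statistics are reproducible
fan_in = 256
std = fan_in ** -0.5  # 1/sqrt(256) = 0.0625

# draw many samples and confirm their empirical standard deviation matches
samples = [random.gauss(0, std) for _ in range(20000)]
mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
# sample_std is close to 0.0625
```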
num_epochs = 30

# function to re-initialize a model with pytorch's default weight initialization
def default_weight_init(m):
    reset_parameters = getattr(m, 'reset_parameters', None)
    if callable(reset_parameters):
        m.reset_parameters()

# reset the model parameters
model_scratch.apply(default_weight_init)

# train the model
model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
                      criterion_scratch, use_cuda, 'model_scratch.pt')
def test(loaders, model, criterion, use_cuda):
    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    # set the module to evaluation mode
    model.eval()
    with torch.no_grad():  # no gradients needed during testing
        for batch_idx, (data, target) in enumerate(loaders['test']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # update average test loss
            test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.item() - test_loss))
            # convert output probabilities to predicted class
            pred = output.data.max(1, keepdim=True)[1]
            # compare predictions to true label
            correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
            total += data.size(0)

    print('Test Loss: {:.6f}\n'.format(test_loss))
    print('Test Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
As you can see, we reached a test accuracy of 32%, well above our 20% target.
loaders_transfer = {'train': None, 'valid': None, 'test': None}
# Batch size and number of workers
batch_size = 128
num_workers = 0
# Define transforms (ImageNet normalization statistics, matching the pretrained backbone)
transform_train = transforms.Compose([
    transforms.Resize(260),
    transforms.CenterCrop(256),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

transform_test_valid = transforms.Compose([
    transforms.Resize(260),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])
# Get the data
train_data = datasets.ImageFolder("train_valid/train", transform= transform_train)
valid_data = datasets.ImageFolder("train_valid/val", transform = transform_test_valid)
test_data = datasets.ImageFolder("/data/landmark_images/test", transform = transform_test_valid)
# Get data loaders
loaders_transfer["train"] = torch.utils.data.DataLoader(train_data,
                                                        batch_size=batch_size,
                                                        num_workers=num_workers,
                                                        shuffle=True)
loaders_transfer["valid"] = torch.utils.data.DataLoader(valid_data,
                                                        batch_size=batch_size,
                                                        num_workers=num_workers)
loaders_transfer["test"] = torch.utils.data.DataLoader(test_data,
                                                       batch_size=batch_size,
                                                       num_workers=num_workers)
## select loss function
criterion_transfer = torch.nn.CrossEntropyLoss()

def get_optimizer_transfer(model):
    ## select and return an optimizer (only the new fc layer is trainable)
    return torch.optim.SGD(model.fc.parameters(), lr=0.1)
I decided on resnet50 as the pretrained backbone.
## Specify model architecture
model_transfer = models.resnet50(pretrained=True)

# freeze the pretrained backbone
for param in model_transfer.parameters():
    param.requires_grad = False

# replace the classifier head (the in-features count depends on the torchvision
# version and on our 256x256 input size)
model_transfer.fc = nn.Linear(8192, 50)

use_cuda = torch.cuda.is_available()
if use_cuda:
    model_transfer = model_transfer.cuda()

model_transfer
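The 8192 in-features value is tied to the torchvision version and input size: older torchvision ResNets ended with a fixed `AvgPool2d(7, stride=1)` rather than adaptive pooling, so a 256×256 input (downsampled 32× by the backbone to an 8×8 map of 2048 channels) pools to 2×2. The arithmetic behind that assumption:

```python
# backbone output for a 256x256 input: 2048 channels at 256/32 = 8x8 spatial size
channels = 2048
feature_map = 256 // 32  # 8

# AvgPool2d(kernel_size=7, stride=1) on an 8x8 map: (8 - 7) // 1 + 1 = 2
pooled = (feature_map - 7) // 1 + 1

fc_in_features = channels * pooled * pooled
print(fc_in_features)  # 8192
```

With newer torchvision (adaptive average pooling) the head input is always 2048 regardless of input size, so `nn.Linear(model_transfer.fc.in_features, 50)` would be the portable choice there.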
# train the model and save the best model parameters at filepath 'model_transfer.pt'
epochs = 30
model_transfer = train(epochs, loaders_transfer, model_transfer, get_optimizer_transfer(model_transfer),
                       criterion_transfer, use_cuda, 'model_transfer.pt')
# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
As you can see, transfer learning reached a test accuracy of 73%, a large improvement over the 32% of the from-scratch model.
import seaborn as sns
from PIL import Image
device = "cuda" if torch.cuda.is_available() else "cpu"
model_transfer.load_state_dict(torch.load('model_transfer.pt', map_location=device))

def predict_landmarks(img_path, k):
    ## return the names of the top k landmarks predicted by the transfer-learned CNN
    image = Image.open(img_path).convert("RGB")
    transform = transforms.Compose([
        transforms.Resize(260),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225]),
    ])
    image = transform(image).unsqueeze(0)
    with torch.no_grad():  # inference only
        output = model_transfer.cpu().eval()(image)
    confidence, top_k_predictions = output.topk(k, dim=1)
    top_k_predictions = top_k_predictions.squeeze()
    return confidence, [classes[ind] for ind in top_k_predictions]
# test on a sample image
predict_landmarks('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg', 5)[1]
Implementing the function suggest_locations, which accepts a file path to an image as input, and then displays the image and the top 3 most likely landmarks as predicted by predict_landmarks.
def suggest_locations(img_path, count=0):
    # get landmark predictions
    confidence, predicted_landmarks = predict_landmarks(img_path, 3)

    # convert the top-3 logits to probabilities with a softmax
    confidence = confidence[0].detach().numpy()
    confidence = np.exp(confidence) / np.exp(confidence).sum()

    ## display image and landmark predictions
    image = np.array(Image.open(img_path).convert("RGB"))
    plt.figure(figsize=(21, 8))
    ax = plt.subplot(1, 2, 1)
    sns.barplot(x=confidence, y=predicted_landmarks, color=sns.color_palette()[0])
    ax.set_title("Probabilities", y=-0.14, fontsize=14)
    ax = plt.subplot(1, 2, 2)
    ax.imshow(image)
    ax.set_title(f"Is this a picture of the \n{predicted_landmarks[0]}, {predicted_landmarks[1]}, or {predicted_landmarks[2]}?",
                 y=-0.20, fontsize=16)
    ax.set_xticks([])
    ax.set_yticks([])
    plt.savefig(f"images/result{count}.png", facecolor="white")
    plt.show()

# test on a sample image
suggest_locations('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg')
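The exp/sum step in suggest_locations is a softmax. Note it is applied to only the top-3 logits, so the three bars sum to 1 but slightly overstate the true class probabilities. A minimal stdlib version of the same computation:

```python
import math

def softmax(logits):
    # exponentiate each logit and normalize so the results sum to 1
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# example logits (hypothetical values, not real model output)
probs = softmax([2.0, 1.0, 0.1])
# probs are positive, sum to 1, and preserve the ordering of the logits
```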
Running the suggest_locations function on images from my own computer.
import os

img_paths = os.listdir("user_images")
for i, path in enumerate(img_paths):
    if path.find("checkpoints") == -1:  # skip Jupyter checkpoint entries
        suggest_locations(f"user_images/{path}", i)
Description:
The results were better than I expected, given how erratically the validation loss behaved during training. I see these possible improvements:
- Use Adam instead of SGD, or a different learning rate. The oscillation in the loss may be caused by the high initial learning rate (0.1), even with the LR scheduler in place.